16. Categorical Features & Feature Column API
TensorFlow Feature Column API for Categorical Features: Key Points
To build categorical features with the TensorFlow Feature Column API, follow these steps:
- Select the categorical feature columns.
- Create a vocabulary text file that lists every unique value of a given categorical feature, one per line, and add a placeholder value for out-of-vocabulary (OOV) values in the first row.
  - Note: For columns with a small number of unique values, you probably don't need this step and can instead pass an array/list to the `categorical_column_with_vocabulary_list` function.
  - A separate vocabulary file works particularly well for features with high cardinality. It also lets you generate the vocab files with other tools, such as SQL, on much larger datasets, and decouple this step from model building if you already produce these files for data profiling.
- Create your new feature by passing the vocabulary column to the function that builds the final derived feature. This can be an embedding or a one-hot encoded feature. In this example, we created a one-hot encoded feature with the `indicator_column` function.
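The vocabulary-file step above can be sketched in plain Python. This is a minimal illustration rather than the course's implementation: the helper name `write_vocab_file`, the placeholder token `OOV_PLACEHOLDER`, the file name, and the sample diagnosis codes are all assumptions made for the example.

```python
import os
import tempfile


def write_vocab_file(values, path, oov_placeholder="OOV_PLACEHOLDER"):
    """Write one unique value per line, with a placeholder in the first row.

    `oov_placeholder` is a hypothetical token chosen for this sketch.
    """
    # Drop missing values, de-duplicate, and sort so the file is stable.
    unique_values = sorted({str(v) for v in values if v is not None})
    with open(path, "w", encoding="utf-8") as f:
        f.write(oov_placeholder + "\n")
        for value in unique_values:
            f.write(value + "\n")
    return path


# Made-up sample codes standing in for a real PRINCIPAL_DIAGNOSIS_CODE column.
sample_codes = ["250.00", "401.9", "250.00", None, "428.0"]
vocab_path = write_vocab_file(
    sample_codes,
    os.path.join(tempfile.mkdtemp(), "principal_diagnosis_vocab.txt"),
)

# Read the file back to confirm its layout.
with open(vocab_path, encoding="utf-8") as f:
    vocab_lines = f.read().splitlines()
print(vocab_lines)
```

The returned path is the kind of value that would be passed as `vocabulary_file` when creating the categorical column.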
Example:
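    import tensorflow as tf
    # vocab_files_list holds the paths of the vocabulary files created in the step above.
    principal_diagnosis_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
        key="PRINCIPAL_DIAGNOSIS_CODE", vocabulary_file=vocab_files_list[0], num_oov_buckets=1)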
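To make the effect of `num_oov_buckets=1` and the one-hot step concrete, here is a plain-Python illustration of the mapping. It imitates the behavior rather than calling TensorFlow: each vocabulary entry maps to its row index, any unseen value maps to one extra index placed after the vocabulary, and the one-hot vector has one slot per index. The function names and sample values are assumptions for this sketch.

```python
def lookup_index(value, vocab, num_oov_buckets=1):
    """Return the row index of `value`, or the OOV bucket index if unseen.

    Only the single-bucket case is modelled here; with more buckets,
    unseen values are spread across them by hashing.
    """
    if num_oov_buckets != 1:
        raise ValueError("this sketch only models num_oov_buckets=1")
    index_by_value = {v: i for i, v in enumerate(vocab)}
    # Unseen values fall into the single bucket right after the vocabulary.
    return index_by_value.get(value, len(vocab))


def one_hot(index, depth):
    """Build a one-hot vector of length `depth` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


# Hypothetical vocabulary, placeholder row first.
vocab = ["OOV_PLACEHOLDER", "250.00", "401.9", "428.0"]
depth = len(vocab) + 1  # vocabulary rows plus one OOV bucket

known = one_hot(lookup_index("401.9", vocab), depth)
unseen = one_hot(lookup_index("999.99", vocab), depth)
print(known)   # [0.0, 0.0, 1.0, 0.0, 0.0]
print(unseen)  # [0.0, 0.0, 0.0, 0.0, 1.0]
```

Every value missing from the vocabulary shares the last slot, so the model can still produce a prediction for codes it never saw during training.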
Additional Resources
Code
If you need the code, see https://github.com/udacity.
Categorical Feature Columns
SOLUTION:
- `categorical_example_df['PRINCIPAL_DIAGNOSIS_CODE'].nunique()` returns `3929`. With that many unique values, use a vocab file!
- When you build your TF dataset, you should pass a predictor field, which is also known as the label.
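The label split mentioned above can be sketched without TensorFlow. This is a hedged illustration, assuming the rows are held as a list of dictionaries and the predictor field is named `label`; the helper name, column names, and values are made up for the example.

```python
def split_features_and_label(rows, label_key):
    """Separate the label (predictor field) from the feature columns."""
    if not rows:
        raise ValueError("rows must not be empty")
    feature_keys = [k for k in rows[0] if k != label_key]
    # Column-oriented features: one list of values per feature name.
    features = {k: [row[k] for row in rows] for k in feature_keys}
    labels = [row[label_key] for row in rows]
    return features, labels


# Made-up rows; "label" is a hypothetical predictor field.
rows = [
    {"PRINCIPAL_DIAGNOSIS_CODE": "250.00", "label": 1},
    {"PRINCIPAL_DIAGNOSIS_CODE": "401.9", "label": 0},
    {"PRINCIPAL_DIAGNOSIS_CODE": "428.0", "label": 1},
]
features, labels = split_features_and_label(rows, label_key="label")
print(features)
print(labels)
```

A `(features, labels)` pair in this form is what is typically handed to `tf.data.Dataset.from_tensor_slices` when building the dataset.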